Japanese Pronunciation Prediction as Phrasal Statistical Machine Translation

Authors

  • Jun Hatori
  • Hisami Suzuki
Abstract

This paper addresses the problem of predicting the pronunciation of Japanese text. The task is difficult because the pronunciation of Japanese characters and words is highly ambiguous. Previous approaches have taken one of two routes. Some treat the task as a dictionary-based, word-level classification problem, which does not handle out-of-vocabulary (OOV) words well. Others focus solely on predicting the pronunciation of OOV words, without considering the contextual disambiguation of word pronunciations in running text. In this paper, we propose a unified approach within the framework of phrasal statistical machine translation (SMT) that combines the strengths of the dictionary-based and substring-based approaches. Our approach is novel in that it combines word- and character-based pronunciations from a dictionary within an SMT framework. The former capture the idiosyncratic properties of word pronunciation, while the latter provide the flexibility to predict the pronunciation of OOV words. We evaluate the model extensively on various test sets. It significantly outperforms the previous state-of-the-art systems, achieving around 90% accuracy in most domains.
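
The core idea is to treat pronunciation prediction as monotone phrasal translation from surface substrings to readings. A small Viterbi decoder can illustrate it. This is a minimal sketch under stated assumptions, not the authors' system:
  • The phrase table PHRASE_TABLE, its probabilities and the function decode are hypothetical.
  • The sketch scores only phrase probabilities. It omits the language model and tuned feature weights that a full phrasal SMT decoder would combine.
Word-level entries (for example 今日 read as きょう) coexist with character-level entries. A known word can therefore win over its compositional reading, while unseen strings still decompose into characters.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy phrase table: surface substring -> [(reading, probability)].
# Word-level entries capture idiosyncratic readings; character-level entries
# act as a fallback for out-of-vocabulary strings. All numbers are made up.
PHRASE_TABLE: Dict[str, List[Tuple[str, float]]] = {
    "今日": [("きょう", 0.9), ("こんにち", 0.1)],  # word-level entry
    "晴れ": [("はれ", 0.95)],                      # word-level entry
    "今": [("いま", 0.7), ("こん", 0.3)],          # character-level entries
    "日": [("ひ", 0.5), ("にち", 0.4), ("び", 0.1)],
    "は": [("わ", 0.6), ("は", 0.4)],              # topic particle vs. plain kana
    "晴": [("は", 0.8), ("せい", 0.2)],
    "れ": [("れ", 1.0)],
}


def decode(text: str,
           table: Dict[str, List[Tuple[str, float]]] = PHRASE_TABLE,
           max_len: int = 4,
           oov_prob: float = 1e-4) -> Tuple[List[Tuple[str, str]], float]:
    """Find the highest-scoring monotone segmentation of `text` into phrases,
    each paired with a reading, using Viterbi search over substring end points."""
    n = len(text)
    best = [-math.inf] * (n + 1)            # best log score for text[:j]
    back: List[Tuple[int, str]] = [(0, "")] * (n + 1)  # (start, reading) per end
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            if best[i] == -math.inf:
                continue
            src = text[i:j]
            options = table.get(src, [])
            # Placeholder OOV handling: copy an unknown single character through.
            if not options and j - i == 1:
                options = [(src, oov_prob)]
            for reading, prob in options:
                score = best[i] + math.log(prob)
                if score > best[j]:
                    best[j] = score
                    back[j] = (i, reading)
    # Follow back-pointers from the end to recover the phrase sequence.
    pieces: List[Tuple[str, str]] = []
    j = n
    while j > 0:
        i, reading = back[j]
        pieces.append((text[i:j], reading))
        j = i
    pieces.reverse()
    return pieces, best[n]


if __name__ == "__main__":
    pieces, score = decode("今日は晴れ")
    print(pieces)  # [('今日', 'きょう'), ('は', 'わ'), ('晴れ', 'はれ')]
```

The word entry for 今日 scores higher than the product of its character readings, so the decoder keeps the idiosyncratic reading. A character with no matching entry is copied through with a small placeholder probability (oov_prob). This simplification was chosen only to keep every input decodable.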

Similar resources

Predicting Word Pronunciation in Japanese

This paper addresses the problem of predicting the pronunciation of Japanese words, especially those that are newly created and therefore not in the dictionary. This is an important task for many applications, including text-to-speech and text input methods. It is also challenging, because Japanese kanji (ideographic) characters typically have multiple possible pronunciations. We approach this p...

Proper Name Machine Translation from Japanese to Japanese Sign Language

This paper describes machine translation of proper names from Japanese to Japanese Sign Language (JSL). "Proper name transliteration" is a kind of machine translation of proper names between spoken languages and involves character-to-character conversion based on pronunciation. However, transliteration methods cannot be applied to Japanese-JSL machine translation because proper names in JSL are ...

Exploiting Phrasal Lexica and Additional Morpho-syntactic Language Resources for Statistical Machine Translation with Scarce Training Data

In this work, the use of a phrasal lexicon for statistical machine translation is proposed, and the relation between data acquisition costs and translation quality for different types and sizes of language resources has been analyzed. The language pairs are Spanish-English and Catalan-English, and the translation is performed in all directions. The phrasal lexicon is used to increase as well as...

Phrasal: A Toolkit for New Directions in Statistical Machine Translation

We present a new version of Phrasal, an open-source toolkit for statistical phrase-based machine translation. This revision includes features that support emerging research trends such as (a) tuning with large feature sets, (b) tuning on large datasets like the bitext, and (c) web-based interactive machine translation. A direct comparison with Moses shows favorable results in terms of decoding s...

Phrasal Segmentation Models for Statistical Machine Translation

Phrasal segmentation models define a mapping from the words of a sentence to sequences of translatable phrases. We discuss the estimation of these models from large quantities of monolingual training text and describe their realization as weighted finite state transducers for incorporation into phrase-based statistical machine translation systems. Results are reported on the NIST Arabic-English...

Publication date: 2011